C Pills - Comparing structs

Things you need to be aware of when comparing structs.

The C programming language does not define the equality operator == for compound types (structs). If you want to compare two variables of the same compound type, you have to explicitly compare each field:

struct my_vec { int x, y; };
struct my_vec v1 = {0}, v2 = {0};

if (v1 == v2);                     // Not allowed
if (v1.x == v2.x && v1.y == v2.y); // You have to do this

This is error-prone, annoying to type and verbose, so it is tempting to just call memcmp. That's what it was made for, right?

Yes. And no. When the compiler lays out a struct in memory, it doesn't only make sure that the struct is large enough for all members to fit; it also has to place each field at an offset that satisfies that field's alignment.

Suppose you have a struct that looks like this:

struct my_type {
    int32_t x;
    int64_t y;
};

If we just look at the size of individual fields, it looks like the total size should be 4+8=12 bytes.
What we don't see is that, on most 64-bit platforms, the compiler wants 8-byte fields to be 8-byte aligned (that is, it wants the bottom log_2(8)=3 bits of the offset to be zeros). Assuming that the struct starts at address 0x0, x occupies offsets 0x0 to 0x3, and the first offset after that to have 3 zero bits at the bottom is 0x8, exactly 4 bytes after the end of the x field. This leaves an empty, unnamed 4-byte gap between x and y.

[Figure: the struct with no padding (x, y) next to how it really is (x, padding, y)]
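The gap can be observed directly with offsetof and sizeof. The following is a minimal sketch; the helper names gap_after_x and print_layout are made up for illustration, and the numbers printed depend on the platform's ABI.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct my_type {
    int32_t x;
    int64_t y;
};

/* Number of unnamed bytes between the end of x and the start of y. */
size_t gap_after_x(void)
{
    return offsetof(struct my_type, y)
         - (offsetof(struct my_type, x) + sizeof(int32_t));
}

/* Print the layout chosen by the compiler for this platform. */
void print_layout(void)
{
    printf("offset of x: %zu\n", offsetof(struct my_type, x));
    printf("offset of y: %zu\n", offsetof(struct my_type, y));
    printf("gap after x: %zu\n", gap_after_x());
    printf("sizeof:      %zu\n", sizeof(struct my_type));
}
```

On a typical x86-64 system one would expect y at offset 8, a 4-byte gap and a total size of 16, but an ABI that only requires 4-byte alignment for int64_t may report no gap at all.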

What does this gap contain? It could be anything. The programmer can't read from or write to it, unless they use pointer arithmetic. Compilers aren't required to initialize this space with anything. Yes, even if you explicitly zero-initialize the variable like this:

// Compiler still allowed to do whatever it wants with those padding bytes
struct my_type a = {0};

This is generally not a problem, but it becomes one if two variables of this compound type are compared with memcmp: even if both fields of the variables are identical, memcmp could still return non-zero if it finds those padding bytes to be different.

[Figure: comparing with memcmp. Two 4-byte sequences that differ in a single byte: 0xDE 0x2A 0xBE 0xEF and 0xDE 0x45 0xBE 0xEF]

This could be the intended behaviour, but it usually isn't.
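To make the effect reproducible rather than dependent on whatever happens to be in memory, the sketch below writes different bytes into the gap on purpose, through an unsigned char pointer, and then calls memcmp. The helpers fill_gap and memcmp_with_different_gaps are hypothetical names introduced only for this demonstration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct my_type {
    int32_t x;
    int64_t y;
};

/* Overwrite the unnamed bytes between x and y with `fill`. */
void fill_gap(struct my_type *v, unsigned char fill)
{
    unsigned char *bytes = (unsigned char *)v;
    size_t begin = offsetof(struct my_type, x) + sizeof v->x;
    size_t end = offsetof(struct my_type, y);

    for (size_t i = begin; i < end; i++)
        bytes[i] = fill;
}

/* Build two values with equal fields but different gap bytes,
 * and return what memcmp makes of them. */
int memcmp_with_different_gaps(void)
{
    struct my_type a, b;

    memset(&a, 0, sizeof a);
    memset(&b, 0, sizeof b);
    a.x = b.x = 1;
    a.y = b.y = 2;

    /* Done last, so no later store to a member can disturb the gap. */
    fill_gap(&a, 0x2A);
    fill_gap(&b, 0x45);

    return memcmp(&a, &b, sizeof a);
}
```

On a platform where the gap exists, the function returns a non-zero value even though a.x == b.x and a.y == b.y. Where there is no gap, fill_gap writes nothing and the two values compare equal.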

This leaves the programmer with two options, neither of which is optimal. The first is to compare each field individually, and wrap that into a function when it starts to get repetitive. The second is to make sure that no implicit padding bytes are inserted in the struct by making them explicit:

struct my_type {
    int32_t x;
    int32_t _pad0; // Exactly fits the 4-byte gap left between x and y
    int64_t y;
};

struct my_type s = {0}; // Now the compiler is required to cooperate
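One way to keep the explicit padding honest is a compile-time check that the struct is exactly as large as the sum of its named members. This is a sketch assuming a C11 compiler (for static_assert); my_type_bytes_equal is a hypothetical helper name.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct my_type {
    int32_t x;
    int32_t _pad0;    /* explicit padding, kept at zero */
    int64_t y;
};

/* Fails to compile if the compiler had to insert padding of its own. */
static_assert(sizeof(struct my_type)
                  == 2 * sizeof(int32_t) + sizeof(int64_t),
              "struct my_type contains implicit padding");

/* Byte-wise comparison; only meaningful while the check above holds. */
bool my_type_bytes_equal(const struct my_type *a, const struct my_type *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

The comparison only gives the expected answer if _pad0 holds the same value in both variables, which is the case when every variable starts out as {0} and nobody writes to _pad0 afterwards.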

Of course, this becomes tiresome when working with data types that change frequently.
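For the first option, the field-by-field comparison can live in one small function, so that there is a single place to update when the struct changes. A minimal sketch, with my_type_equal as a made-up name:

```c
#include <stdbool.h>
#include <stdint.h>

struct my_type {
    int32_t x;
    int64_t y;
};

/* Compare the named fields only; padding bytes are never looked at. */
bool my_type_equal(const struct my_type *a, const struct my_type *b)
{
    return a->x == b->x && a->y == b->y;
}
```

A call then reads like if (my_type_equal(&v1, &v2)), which is about as close to v1 == v2 as C allows. The catch is that the function has to be extended by hand whenever a field is added.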